Accelerated Gradient Descent by Factor-Centering Decomposition
نویسنده
چکیده
Gradient factor centering is a new methodology for decomposing neural networks into biased and centered subnets which are then trained in parallel. The decomposition can be applied to any pattern-dependent factor in the network’s gradient, and is designed such that the subnets are more amenable to optimization by gradient descent than the original network: biased subnets because of their simplified architecture, centered subnets due to a modified gradient that improves conditioning. The architectural and algorithmic modifications mandated by this approach include both familiar and novel elements, often in prescribed combinations. The framework suggests for instance that shortcut connections — a well-known architectural feature — should work best in conjunction with slope centering, a new technique described herein. Our benchmark experiments bear out this prediction, and show that factorcentering decomposition can speed up learning significantly without adversely affecting the trained network’s generalization ability.
منابع مشابه
Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling
Accelerated coordinate descent is widely used in optimization due to its cheap per-iteration cost and scalability to large-scale problems. Up to a primal-dual transformation, it is also the same as accelerated stochastic gradient descent that is one of the central methods used in machine learning. In this paper, we improve the best known running time of accelerated coordinate descent by a facto...
متن کاملScaling up Natural Gradient by Sparsely Factorizing the Inverse Fisher Matrix
Second-order optimization methods, such as natural gradient, are difficult to apply to highdimensional problems, because they require approximately solving large linear systems. We present FActorized Natural Gradient (FANG), an approximation to natural gradient descent where the Fisher matrix is approximated with a Gaussian graphical model whose precision matrix can be computed efficiently. We ...
متن کاملA geometric alternative to Nesterov's accelerated gradient descent
We propose a new method for unconstrained optimization of a smooth and strongly convex function, which attains the optimal rate of convergence of Nesterov’s accelerated gradient descent. The new algorithm has a simple geometric interpretation, loosely inspired by the ellipsoid method. We provide some numerical evidence that the new method can be superior to Nesterov’s accelerated gradient descent.
متن کاملAn eigenvalue study on the sufficient descent property of a modified Polak-Ribière-Polyak conjugate gradient method
Based on an eigenvalue analysis, a new proof for the sufficient descent property of the modified Polak-Ribière-Polyak conjugate gradient method proposed by Yu et al. is presented.
متن کاملConditional Accelerated Lazy Stochastic Gradient Descent
In this work we introduce a conditional accelerated lazy stochastic gradient descent algorithm with optimal number of calls to a stochastic first-order oracle and convergence rate O( 1 ε2 ) improving over the projection-free, Online Frank-Wolfe based stochastic gradient descent of Hazan and Kale [2012] with convergence rate O( 1 ε4 ).
متن کامل